import pandas as pd
from fast_ml.tools import EDA
df = pd.read_csv('house_prices.csv')
df.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
5 rows × 81 columns
eda_report = EDA.generate_report(df, target='SalePrice', model_type='reg')
eda_report.report_title_ = 'EDA Report for House Price Dataset (Regression)'
eda_report.report_user_ = 'Samarth Agrawal (using Fast ML)'
eda_report.show()
(1460, 81)
Number of Numerical Variables detected : 38
Number of Categorical Variables detected : 43
Number of DateTime Variables detected : 0
After calibration number of Variables used for Numerical EDA : 20
After calibration number of Variables used for Categorical EDA : 61
Out of 61 Categorical EDA Variables 0 have cardinality of more than 200: []
After calibration number of Variables used for DateTime EDA : 0
924.03 KB
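The calibration counts above suggest that some numeric columns are moved into the categorical group: MSSubClass and OverallQual, for example, are int64 but are later analysed as categories. A minimal sketch of one way such a split could work, assuming a simple unique-value cutoff. The function name `split_variable_types` and the cutoff of 25 are hypothetical and are not taken from Fast ML.

```python
import pandas as pd


def split_variable_types(df: pd.DataFrame, max_numeric_as_categorical: int = 25):
    """Assign each column to a numerical or a categorical EDA group.

    The cutoff is a hypothetical value chosen for illustration only.
    """
    numerical, categorical = [], []
    for col in df.columns:
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        n_unique = df[col].nunique(dropna=True)
        # Numeric columns with few distinct values are treated as categories.
        if is_numeric and n_unique > max_numeric_as_categorical:
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical


# Toy frame standing in for the house price data.
toy = pd.DataFrame({
    "LotArea": range(100, 200),                       # 100 distinct values
    "OverallQual": [i % 10 + 1 for i in range(100)],  # 10 distinct levels
    "Street": ["Pave", "Grvl"] * 50,                  # non-numeric column
})
numerical, categorical = split_variable_types(toy)
```

With this toy frame, only `LotArea` stays in the numerical group.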
| | data_type | data_type_grp | num_unique_values | sample_unique_values | num_missing | perc_missing |
|---|---|---|---|---|---|---|
| Id | int64 | Numerical | 1460 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | 0 | 0 |
| MSSubClass | int64 | Numerical | 15 | [60, 20, 70, 50, 190, 45, 90, 120, 30, 85] | 0 | 0 |
| MSZoning | object | Categorical | 5 | [RL, RM, C (all), FV, RH] | 0 | 0 |
| LotFrontage | float64 | Numerical | 110 | [65.0, 80.0, 68.0, 60.0, 84.0, 85.0, 75.0, nan... | 259 | 17.7397 |
| LotArea | int64 | Numerical | 1073 | [8450, 9600, 11250, 9550, 14260, 14115, 10084,... | 0 | 0 |
| Street | object | Categorical | 2 | [Pave, Grvl] | 0 | 0 |
| Alley | object | Categorical | 2 | [nan, Grvl, Pave] | 1369 | 93.7671 |
| LotShape | object | Categorical | 4 | [Reg, IR1, IR2, IR3] | 0 | 0 |
| LandContour | object | Categorical | 4 | [Lvl, Bnk, Low, HLS] | 0 | 0 |
| Utilities | object | Categorical | 2 | [AllPub, NoSeWa] | 0 | 0 |
| LotConfig | object | Categorical | 5 | [Inside, FR2, Corner, CulDSac, FR3] | 0 | 0 |
| LandSlope | object | Categorical | 3 | [Gtl, Mod, Sev] | 0 | 0 |
| Neighborhood | object | Categorical | 25 | [CollgCr, Veenker, Crawfor, NoRidge, Mitchel, ... | 0 | 0 |
| Condition1 | object | Categorical | 9 | [Norm, Feedr, PosN, Artery, RRAe, RRNn, RRAn, ... | 0 | 0 |
| Condition2 | object | Categorical | 8 | [Norm, Artery, RRNn, Feedr, PosN, PosA, RRAn, ... | 0 | 0 |
| BldgType | object | Categorical | 5 | [1Fam, 2fmCon, Duplex, TwnhsE, Twnhs] | 0 | 0 |
| HouseStyle | object | Categorical | 8 | [2Story, 1Story, 1.5Fin, 1.5Unf, SFoyer, SLvl,... | 0 | 0 |
| OverallQual | int64 | Numerical | 10 | [7, 6, 8, 5, 9, 4, 10, 3, 1, 2] | 0 | 0 |
| OverallCond | int64 | Numerical | 9 | [5, 8, 6, 7, 4, 2, 3, 9, 1] | 0 | 0 |
| YearBuilt | int64 | Numerical | 112 | [2003, 1976, 2001, 1915, 2000, 1993, 2004, 197... | 0 | 0 |
| YearRemodAdd | int64 | Numerical | 61 | [2003, 1976, 2002, 1970, 2000, 1995, 2005, 197... | 0 | 0 |
| RoofStyle | object | Categorical | 6 | [Gable, Hip, Gambrel, Mansard, Flat, Shed] | 0 | 0 |
| RoofMatl | object | Categorical | 8 | [CompShg, WdShngl, Metal, WdShake, Membran, Ta... | 0 | 0 |
| Exterior1st | object | Categorical | 15 | [VinylSd, MetalSd, Wd Sdng, HdBoard, BrkFace, ... | 0 | 0 |
| Exterior2nd | object | Categorical | 16 | [VinylSd, MetalSd, Wd Shng, HdBoard, Plywood, ... | 0 | 0 |
| MasVnrType | object | Categorical | 4 | [BrkFace, None, Stone, BrkCmn, nan] | 8 | 0.547945 |
| MasVnrArea | float64 | Numerical | 327 | [196.0, 0.0, 162.0, 350.0, 186.0, 240.0, 286.0... | 8 | 0.547945 |
| ExterQual | object | Categorical | 4 | [Gd, TA, Ex, Fa] | 0 | 0 |
| ExterCond | object | Categorical | 5 | [TA, Gd, Fa, Po, Ex] | 0 | 0 |
| Foundation | object | Categorical | 6 | [PConc, CBlock, BrkTil, Wood, Slab, Stone] | 0 | 0 |
| BsmtQual | object | Categorical | 4 | [Gd, TA, Ex, nan, Fa] | 37 | 2.53425 |
| BsmtCond | object | Categorical | 4 | [TA, Gd, nan, Fa, Po] | 37 | 2.53425 |
| BsmtExposure | object | Categorical | 4 | [No, Gd, Mn, Av, nan] | 38 | 2.60274 |
| BsmtFinType1 | object | Categorical | 6 | [GLQ, ALQ, Unf, Rec, BLQ, nan, LwQ] | 37 | 2.53425 |
| BsmtFinSF1 | int64 | Numerical | 637 | [706, 978, 486, 216, 655, 732, 1369, 859, 0, 851] | 0 | 0 |
| BsmtFinType2 | object | Categorical | 6 | [Unf, BLQ, nan, ALQ, Rec, LwQ, GLQ] | 38 | 2.60274 |
| BsmtFinSF2 | int64 | Numerical | 144 | [0, 32, 668, 486, 93, 491, 506, 712, 362, 41] | 0 | 0 |
| BsmtUnfSF | int64 | Numerical | 780 | [150, 284, 434, 540, 490, 64, 317, 216, 952, 140] | 0 | 0 |
| TotalBsmtSF | int64 | Numerical | 721 | [856, 1262, 920, 756, 1145, 796, 1686, 1107, 9... | 0 | 0 |
| Heating | object | Categorical | 6 | [GasA, GasW, Grav, Wall, OthW, Floor] | 0 | 0 |
| HeatingQC | object | Categorical | 5 | [Ex, Gd, TA, Fa, Po] | 0 | 0 |
| CentralAir | object | Categorical | 2 | [Y, N] | 0 | 0 |
| Electrical | object | Categorical | 5 | [SBrkr, FuseF, FuseA, FuseP, Mix, nan] | 1 | 0.0684932 |
| 1stFlrSF | int64 | Numerical | 753 | [856, 1262, 920, 961, 1145, 796, 1694, 1107, 1... | 0 | 0 |
| 2ndFlrSF | int64 | Numerical | 417 | [854, 0, 866, 756, 1053, 566, 983, 752, 1142, ... | 0 | 0 |
| LowQualFinSF | int64 | Numerical | 24 | [0, 360, 513, 234, 528, 572, 144, 392, 371, 390] | 0 | 0 |
| GrLivArea | int64 | Numerical | 861 | [1710, 1262, 1786, 1717, 2198, 1362, 1694, 209... | 0 | 0 |
| BsmtFullBath | int64 | Numerical | 4 | [1, 0, 2, 3] | 0 | 0 |
| BsmtHalfBath | int64 | Numerical | 3 | [0, 1, 2] | 0 | 0 |
| FullBath | int64 | Numerical | 4 | [2, 1, 3, 0] | 0 | 0 |
| HalfBath | int64 | Numerical | 3 | [1, 0, 2] | 0 | 0 |
| BedroomAbvGr | int64 | Numerical | 8 | [3, 4, 1, 2, 0, 5, 6, 8] | 0 | 0 |
| KitchenAbvGr | int64 | Numerical | 4 | [1, 2, 3, 0] | 0 | 0 |
| KitchenQual | object | Categorical | 4 | [Gd, TA, Ex, Fa] | 0 | 0 |
| TotRmsAbvGrd | int64 | Numerical | 12 | [8, 6, 7, 9, 5, 11, 4, 10, 12, 3] | 0 | 0 |
| Functional | object | Categorical | 7 | [Typ, Min1, Maj1, Min2, Mod, Maj2, Sev] | 0 | 0 |
| Fireplaces | int64 | Numerical | 4 | [0, 1, 2, 3] | 0 | 0 |
| FireplaceQu | object | Categorical | 5 | [nan, TA, Gd, Fa, Ex, Po] | 690 | 47.2603 |
| GarageType | object | Categorical | 6 | [Attchd, Detchd, BuiltIn, CarPort, nan, Basmen... | 81 | 5.54795 |
| GarageYrBlt | float64 | Numerical | 97 | [2003.0, 1976.0, 2001.0, 1998.0, 2000.0, 1993.... | 81 | 5.54795 |
| GarageFinish | object | Categorical | 3 | [RFn, Unf, Fin, nan] | 81 | 5.54795 |
| GarageCars | int64 | Numerical | 5 | [2, 3, 1, 0, 4] | 0 | 0 |
| GarageArea | int64 | Numerical | 441 | [548, 460, 608, 642, 836, 480, 636, 484, 468, ... | 0 | 0 |
| GarageQual | object | Categorical | 5 | [TA, Fa, Gd, nan, Ex, Po] | 81 | 5.54795 |
| GarageCond | object | Categorical | 5 | [TA, Fa, nan, Gd, Po, Ex] | 81 | 5.54795 |
| PavedDrive | object | Categorical | 3 | [Y, N, P] | 0 | 0 |
| WoodDeckSF | int64 | Numerical | 274 | [0, 298, 192, 40, 255, 235, 90, 147, 140, 160] | 0 | 0 |
| OpenPorchSF | int64 | Numerical | 202 | [61, 0, 42, 35, 84, 30, 57, 204, 4, 21] | 0 | 0 |
| EnclosedPorch | int64 | Numerical | 120 | [0, 272, 228, 205, 176, 87, 172, 102, 37, 144] | 0 | 0 |
| 3SsnPorch | int64 | Numerical | 20 | [0, 320, 407, 130, 180, 168, 140, 508, 238, 245] | 0 | 0 |
| ScreenPorch | int64 | Numerical | 76 | [0, 176, 198, 291, 252, 99, 184, 168, 130, 142] | 0 | 0 |
| PoolArea | int64 | Numerical | 8 | [0, 512, 648, 576, 555, 480, 519, 738] | 0 | 0 |
| PoolQC | object | Categorical | 3 | [nan, Ex, Fa, Gd] | 1453 | 99.5205 |
| Fence | object | Categorical | 4 | [nan, MnPrv, GdWo, GdPrv, MnWw] | 1179 | 80.7534 |
| MiscFeature | object | Categorical | 4 | [nan, Shed, Gar2, Othr, TenC] | 1406 | 96.3014 |
| MiscVal | int64 | Numerical | 21 | [0, 700, 350, 500, 400, 480, 450, 15500, 1200,... | 0 | 0 |
| MoSold | int64 | Numerical | 12 | [2, 5, 9, 12, 10, 8, 11, 4, 1, 7] | 0 | 0 |
| YrSold | int64 | Numerical | 5 | [2008, 2007, 2006, 2009, 2010] | 0 | 0 |
| SaleType | object | Categorical | 9 | [WD, New, COD, ConLD, ConLI, CWD, ConLw, Con, ... | 0 | 0 |
| SaleCondition | object | Categorical | 6 | [Normal, Abnorml, Partial, AdjLand, Alloca, Fa... | 0 | 0 |
| SalePrice | int64 | Numerical | 663 | [208500, 181500, 223500, 140000, 250000, 14300... | 0 | 0 |
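A per-column summary like the one above can be approximated with plain pandas. The sketch below is an illustration rather than Fast ML's implementation; `summarize_columns` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame, n_samples: int = 10) -> pd.DataFrame:
    """Build one summary row per column of df."""
    records = []
    for col in df.columns:
        s = df[col]
        records.append({
            "data_type": str(s.dtype),
            "data_type_grp": "Numerical" if pd.api.types.is_numeric_dtype(s) else "Categorical",
            "num_unique_values": s.nunique(dropna=True),  # NaN is not counted
            "sample_unique_values": list(s.unique()[:n_samples]),
            "num_missing": int(s.isna().sum()),
            "perc_missing": round(100 * s.isna().mean(), 4),
        })
    return pd.DataFrame(records, index=df.columns)


toy = pd.DataFrame({
    "LotFrontage": [65.0, 80.0, np.nan, 60.0],
    "Alley": [None, "Grvl", None, "Pave"],
})
summary = summarize_columns(toy)
```

Note that `nunique(dropna=True)` excludes NaN from the count, which is consistent with Alley showing 2 unique values above even though NaN appears in its sample list.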
| cardinality_bin | Categorical | Numerical | Total |
|---|---|---|---|
| (0 -- 10] | 40 | 12 | 52 |
| (10 -- 20] | 2 | 4 | 6 |
| (20 -- 30] | 1 | 2 | 3 |
| (90 -- 100] | 0 | 3 | 3 |
| (100 -- 200] | 0 | 4 | 4 |
| (200 -- 500] | 0 | 5 | 5 |
| (500 -- 1000] | 0 | 6 | 6 |
| 1000+ | 0 | 2 | 2 |
| Total | 43 | 38 | 81 |
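The cardinality bins above could be reproduced roughly as follows. This is a sketch under assumptions: the bin edges are inferred from the labels in the table, and `cardinality_table` is a hypothetical helper, not a Fast ML function. Keeping an explicit label order avoids the alphabetical sorting that string labels otherwise get.

```python
import pandas as pd


def cardinality_table(summary: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate unique-value bins against the variable group."""
    # Bin edges inferred from the labels in the report (an assumption).
    edges = list(range(0, 101, 10)) + [200, 500, 1000, float("inf")]
    labels = [f"({lo} -- {hi}]" for lo, hi in zip(edges[:-2], edges[1:-1])] + ["1000+"]
    bins = pd.cut(summary["num_unique_values"], bins=edges, labels=labels).astype(str)
    counts = (
        summary.assign(cardinality_bin=bins)
        .groupby(["cardinality_bin", "data_type_grp"])
        .size()
        .unstack(fill_value=0)
    )
    # Restore numeric bin order and drop bins that contain no variables.
    counts = counts.reindex([label for label in labels if label in counts.index])
    counts["Total"] = counts.sum(axis=1)
    counts.loc["Total"] = counts.sum()
    return counts


# Toy input with the two columns the helper needs.
summary = pd.DataFrame({
    "num_unique_values": [5, 2, 15, 1073, 110, 25],
    "data_type_grp": ["Categorical", "Categorical", "Numerical",
                      "Numerical", "Numerical", "Categorical"],
})
table = cardinality_table(summary)
```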
i. Variables WITHOUT missing values
| missing_bin | Categorical | Numerical | Total Non Missing |
|---|---|---|---|
| 0 | 27 | 35 | 62 |
| Total Non Missing | 27 | 35 | 62 |
ii. Variables WITH missing values
| missing_bin | Categorical | Numerical | Total Missing |
|---|---|---|---|
| (0 -- 10] | 11 | 2 | 13 |
| (10 -- 20] | 0 | 1 | 1 |
| (40 -- 50] | 1 | 0 | 1 |
| (80 -- 90] | 1 | 0 | 1 |
| (90 -- 100] | 3 | 0 | 3 |
| Total Missing | 16 | 3 | 19 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Id | 1460.0 | 730.5 | 421.610009 | 1.0 | 146.9 | 292.8 | 438.7 | 584.6 | 730.5 | 876.4 | 1022.3 | 1168.2 | 1314.1 | 1460.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| Id | 0 | 0 | 0 | 0.0 |
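Each numerical variable in this part of the report gets a decile table and an outlier count. The report does not state its outlier rule, so the sketch below assumes the common 1.5 × IQR fences and expresses the outlier share relative to all rows. Both choices are assumptions, and `iqr_outliers` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> dict:
    """Count values outside the k * IQR fences (assumed rule)."""
    n_rows = len(s)  # share is computed over all rows, including missing ones
    s = s.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower = int((s < q1 - k * iqr).sum())
    upper = int((s > q3 + k * iqr).sum())
    total = lower + upper
    return {
        "lower_bound_outliers": lower,
        "upper_bound_outliers": upper,
        "total_outliers": total,
        "perc_outliers": round(100 * total / n_rows, 1),
    }


# Deciles (0%, 10%, ..., 100%) for an Id-like column running from 1 to 1460.
ids = pd.Series(range(1, 1461))
deciles = ids.quantile(np.linspace(0, 1, 11))

id_outliers = iqr_outliers(ids)
skewed_outliers = iqr_outliers(pd.Series([1, 2, 3, 4, 5, 100]))
```

For the evenly spaced `ids` series no value falls outside the fences, while the single large value in the second series is counted as an upper outlier.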
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LotFrontage | 1201.0 | 70.049958 | 24.284752 | 21.0 | 44.0 | 53.0 | 60.0 | 63.0 | 69.0 | 74.0 | 79.0 | 85.0 | 96.0 | 313.0 |
Number of Missing Values: 259 Percentage : 18.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| LotFrontage | 42 | 46 | 88 | 6.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LotArea | 1460.0 | 10516.828082 | 9981.264932 | 1300.0 | 5000.0 | 7078.4 | 8063.7 | 8793.4 | 9478.5 | 10198.2 | 11066.5 | 12205.8 | 14381.7 | 215245.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| LotArea | 2 | 67 | 69 | 5.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YearBuilt | 1460.0 | 1971.267808 | 30.202904 | 1872.0 | 1924.9 | 1947.8 | 1958.0 | 1965.0 | 1973.0 | 1984.0 | 1997.3 | 2003.0 | 2006.0 | 2010.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| YearBuilt | 7 | 0 | 7 | 0.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YearRemodAdd | 1460.0 | 1984.865753 | 20.645407 | 1950.0 | 1950.0 | 1961.8 | 1971.0 | 1980.0 | 1994.0 | 1998.0 | 2002.0 | 2005.0 | 2006.0 | 2010.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| YearRemodAdd | 0 | 0 | 0 | 0.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MasVnrArea | 1452.0 | 103.685262 | 181.066207 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 16.0 | 117.0 | 206.0 | 335.0 | 1600.0 |
Number of Missing Values: 8 Percentage : 1.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| MasVnrArea | 0 | 96 | 96 | 7.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BsmtFinSF1 | 1460.0 | 443.639726 | 456.098091 | 0.0 | 0.0 | 0.0 | 0.0 | 218.6 | 383.5 | 525.6 | 655.0 | 806.4 | 1065.5 | 5644.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| BsmtFinSF1 | 0 | 7 | 7 | 0.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BsmtFinSF2 | 1460.0 | 46.549315 | 161.319273 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 117.2 | 1474.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| BsmtFinSF2 | 0 | 167 | 167 | 11.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BsmtUnfSF | 1460.0 | 567.240411 | 441.866955 | 0.0 | 74.9 | 172.0 | 280.0 | 374.6 | 477.5 | 604.4 | 736.0 | 896.0 | 1232.0 | 2336.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| BsmtUnfSF | 0 | 29 | 29 | 2.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TotalBsmtSF | 1460.0 | 1057.429452 | 438.705324 | 0.0 | 636.9 | 755.8 | 840.0 | 910.0 | 991.5 | 1088.0 | 1216.0 | 1391.2 | 1602.2 | 6110.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| TotalBsmtSF | 37 | 24 | 61 | 4.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1stFlrSF | 1460.0 | 1162.626712 | 386.587738 | 334.0 | 756.9 | 848.0 | 915.7 | 1000.2 | 1087.0 | 1182.0 | 1314.0 | 1482.4 | 1680.0 | 4692.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| 1stFlrSF | 0 | 20 | 20 | 1.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2ndFlrSF | 1460.0 | 346.992466 | 436.528436 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 455.4 | 672.0 | 796.2 | 954.2 | 2065.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| 2ndFlrSF | 0 | 2 | 2 | 0.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GrLivArea | 1460.0 | 1515.463699 | 525.480383 | 334.0 | 912.0 | 1066.6 | 1208.0 | 1339.0 | 1464.0 | 1578.0 | 1709.3 | 1869.0 | 2158.3 | 5642.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| GrLivArea | 0 | 31 | 31 | 2.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GarageYrBlt | 1379.0 | 1978.506164 | 24.689725 | 1900.0 | 1945.0 | 1957.0 | 1965.0 | 1973.0 | 1980.0 | 1993.0 | 1999.0 | 2004.0 | 2006.0 | 2010.0 |
Number of Missing Values: 81 Percentage : 6.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| GarageYrBlt | 0 | 0 | 0 | 0.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GarageArea | 1460.0 | 472.980137 | 213.804841 | 0.0 | 240.0 | 295.6 | 384.0 | 440.0 | 480.0 | 516.0 | 560.0 | 620.2 | 757.1 | 1418.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| GarageArea | 0 | 21 | 21 | 1.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WoodDeckSF | 1460.0 | 94.244521 | 125.338794 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 100.0 | 144.0 | 192.0 | 262.0 | 857.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| WoodDeckSF | 0 | 32 | 32 | 2.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenPorchSF | 1460.0 | 46.660274 | 66.256028 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 25.0 | 40.0 | 57.0 | 83.2 | 130.0 | 547.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| OpenPorchSF | 0 | 77 | 77 | 5.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EnclosedPorch | 1460.0 | 21.95411 | 61.119149 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 112.0 | 552.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| EnclosedPorch | 0 | 208 | 208 | 14.0 |
| | count | mean | std | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ScreenPorch | 1460.0 | 15.060959 | 55.757415 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 480.0 |
Number of Missing Values: 0 Percentage : 0.0 %
| | lower_bound_outliers | upper_bound_outliers | total_outliers | perc_outliers |
|---|---|---|---|---|
| ScreenPorch | 0 | 116 | 116 | 8.0 |
Number of unique categories in MSZoning: 5 Let's look at some categories of MSZoning: ['RL' 'RM' 'C (all)' 'FV' 'RH']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
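The RARE label collapses infrequent categories into a single bucket so that sparse levels do not dominate the plots. A minimal sketch, assuming a 5% frequency cutoff (the report does not state the threshold it uses) and a hypothetical helper `add_rare_label`:

```python
import pandas as pd


def add_rare_label(s: pd.Series, min_share: float = 0.05, rare_label: str = "RARE") -> pd.Series:
    """Replace categories whose share is below min_share with rare_label."""
    shares = s.value_counts(normalize=True)
    frequent = shares[shares >= min_share].index
    # Keep frequent categories and missing values; relabel everything else.
    return s.where(s.isin(frequent) | s.isna(), rare_label)


# Toy zoning column: two common levels and two sparse ones.
zoning = pd.Series(["RL"] * 80 + ["RM"] * 15 + ["FV"] * 3 + ["RH"] * 2)
grouped = add_rare_label(zoning)
```

Here `FV` and `RH` each fall below the assumed cutoff, so both end up under `RARE`.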
Number of unique categories in Street: 2 Let's look at some categories of Street: ['Pave' 'Grvl']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Alley: 2 Let's look at some categories of Alley: [nan 'Grvl' 'Pave']
Number of Missing Values: 1369 Percentage : 94.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in LotShape: 4 Let's look at some categories of LotShape: ['Reg' 'IR1' 'IR2' 'IR3']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in LandContour: 4 Let's look at some categories of LandContour: ['Lvl' 'Bnk' 'Low' 'HLS']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Utilities: 2 Let's look at some categories of Utilities: ['AllPub' 'NoSeWa']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in LotConfig: 5 Let's look at some categories of LotConfig: ['Inside' 'FR2' 'Corner' 'CulDSac' 'FR3']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in LandSlope: 3 Let's look at some categories of LandSlope: ['Gtl' 'Mod' 'Sev']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Neighborhood: 25 Let's look at some categories of Neighborhood: ['CollgCr' 'Veenker' 'Crawfor' 'NoRidge' 'Mitchel' 'Somerst' 'NWAmes' 'OldTown' 'BrkSide' 'Sawyer' 'NridgHt' 'NAmes' 'SawyerW' 'IDOTRR' 'MeadowV' 'Edwards' 'Timber' 'Gilbert' 'StoneBr' 'ClearCr']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Condition1: 9 Let's look at some categories of Condition1: ['Norm' 'Feedr' 'PosN' 'Artery' 'RRAe' 'RRNn' 'RRAn' 'PosA' 'RRNe']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Condition2: 8 Let's look at some categories of Condition2: ['Norm' 'Artery' 'RRNn' 'Feedr' 'PosN' 'PosA' 'RRAn' 'RRAe']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BldgType: 5 Let's look at some categories of BldgType: ['1Fam' '2fmCon' 'Duplex' 'TwnhsE' 'Twnhs']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in HouseStyle: 8 Let's look at some categories of HouseStyle: ['2Story' '1Story' '1.5Fin' '1.5Unf' 'SFoyer' 'SLvl' '2.5Unf' '2.5Fin']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in RoofStyle: 6 Let's look at some categories of RoofStyle: ['Gable' 'Hip' 'Gambrel' 'Mansard' 'Flat' 'Shed']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in RoofMatl: 8 Let's look at some categories of RoofMatl: ['CompShg' 'WdShngl' 'Metal' 'WdShake' 'Membran' 'Tar&Grv' 'Roll' 'ClyTile']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Exterior1st: 15 Let's look at some categories of Exterior1st: ['VinylSd' 'MetalSd' 'Wd Sdng' 'HdBoard' 'BrkFace' 'WdShing' 'CemntBd' 'Plywood' 'AsbShng' 'Stucco' 'BrkComm' 'AsphShn' 'Stone' 'ImStucc' 'CBlock']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Exterior2nd: 16 Let's look at some categories of Exterior2nd: ['VinylSd' 'MetalSd' 'Wd Shng' 'HdBoard' 'Plywood' 'Wd Sdng' 'CmentBd' 'BrkFace' 'Stucco' 'AsbShng' 'Brk Cmn' 'ImStucc' 'AsphShn' 'Stone' 'Other' 'CBlock']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in MasVnrType: 4 Let's look at some categories of MasVnrType: ['BrkFace' 'None' 'Stone' 'BrkCmn' nan]
Number of Missing Values: 8 Percentage : 1.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in ExterQual: 4 Let's look at some categories of ExterQual: ['Gd' 'TA' 'Ex' 'Fa']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in ExterCond: 5 Let's look at some categories of ExterCond: ['TA' 'Gd' 'Fa' 'Po' 'Ex']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Foundation: 6 Let's look at some categories of Foundation: ['PConc' 'CBlock' 'BrkTil' 'Wood' 'Slab' 'Stone']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BsmtQual: 4 Let's look at some categories of BsmtQual: ['Gd' 'TA' 'Ex' nan 'Fa']
Number of Missing Values: 37 Percentage : 3.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BsmtCond: 4 Let's look at some categories of BsmtCond: ['TA' 'Gd' nan 'Fa' 'Po']
Number of Missing Values: 37 Percentage : 3.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BsmtExposure: 4 Let's look at some categories of BsmtExposure: ['No' 'Gd' 'Mn' 'Av' nan]
Number of Missing Values: 38 Percentage : 3.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BsmtFinType1: 6 Let's look at some categories of BsmtFinType1: ['GLQ' 'ALQ' 'Unf' 'Rec' 'BLQ' nan 'LwQ']
Number of Missing Values: 37 Percentage : 3.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in BsmtFinType2: 6 Let's look at some categories of BsmtFinType2: ['Unf' 'BLQ' nan 'ALQ' 'Rec' 'LwQ' 'GLQ']
Number of Missing Values: 38 Percentage : 3.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Heating: 6 Let's look at some categories of Heating: ['GasA' 'GasW' 'Grav' 'Wall' 'OthW' 'Floor']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in HeatingQC: 5 Let's look at some categories of HeatingQC: ['Ex' 'Gd' 'TA' 'Fa' 'Po']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in CentralAir: 2 Let's look at some categories of CentralAir: ['Y' 'N']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Electrical: 5 Let's look at some categories of Electrical: ['SBrkr' 'FuseF' 'FuseA' 'FuseP' 'Mix' nan]
Number of Missing Values: 1 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in KitchenQual: 4 Let's look at some categories of KitchenQual: ['Gd' 'TA' 'Ex' 'Fa']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Functional: 7 Let's look at some categories of Functional: ['Typ' 'Min1' 'Maj1' 'Min2' 'Mod' 'Maj2' 'Sev']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in FireplaceQu: 5 Let's look at some categories of FireplaceQu: [nan 'TA' 'Gd' 'Fa' 'Ex' 'Po']
Number of Missing Values: 690 Percentage : 47.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in GarageType: 6 Let's look at some categories of GarageType: ['Attchd' 'Detchd' 'BuiltIn' 'CarPort' nan 'Basment' '2Types']
Number of Missing Values: 81 Percentage : 6.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in GarageFinish: 3 Let's look at some categories of GarageFinish: ['RFn' 'Unf' 'Fin' nan]
Number of Missing Values: 81 Percentage : 6.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in GarageQual: 5 Let's look at some categories of GarageQual: ['TA' 'Fa' 'Gd' nan 'Ex' 'Po']
Number of Missing Values: 81 Percentage : 6.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in GarageCond: 5 Let's look at some categories of GarageCond: ['TA' 'Fa' nan 'Gd' 'Po' 'Ex']
Number of Missing Values: 81 Percentage : 6.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in PavedDrive: 3 Let's look at some categories of PavedDrive: ['Y' 'N' 'P']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in PoolQC: 3 Let's look at some categories of PoolQC: [nan 'Ex' 'Fa' 'Gd']
Number of Missing Values: 1453 Percentage : 100.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in Fence: 4 Let's look at some categories of Fence: [nan 'MnPrv' 'GdWo' 'GdPrv' 'MnWw']
Number of Missing Values: 1179 Percentage : 81.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in MiscFeature: 4 Let's look at some categories of MiscFeature: [nan 'Shed' 'Gar2' 'Othr' 'TenC']
Number of Missing Values: 1406 Percentage : 96.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in SaleType: 9 Let's look at some categories of SaleType: ['WD' 'New' 'COD' 'ConLD' 'ConLI' 'CWD' 'ConLw' 'Con' 'Oth']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in SaleCondition: 6 Let's look at some categories of SaleCondition: ['Normal' 'Abnorml' 'Partial' 'AdjLand' 'Alloca' 'Family']
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in MSSubClass: 15 Let's look at some categories of MSSubClass: [ 60 20 70 50 190 45 90 120 30 85 80 160 75 180 40]
Number of Missing Values: 0 Percentage : 0.0 %
i. All the labels As Is
ii. After adding the RARE label
Number of unique categories in OverallQual: 10 Let's look at some categories of OverallQual: [ 7 6 8 5 9 4 10 3 1 2]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in OverallCond: 9 Let's look at some categories of OverallCond: [5 8 6 7 4 2 3 9 1]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in LowQualFinSF: 24 Let's look at some categories of LowQualFinSF: [ 0 360 513 234 528 572 144 392 371 390 420 473 156 515 80 53 232 481 120 514]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in BsmtFullBath: 4 Let's look at some categories of BsmtFullBath: [1 0 2 3]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in BsmtHalfBath: 3 Let's look at some categories of BsmtHalfBath: [0 1 2]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in FullBath: 4 Let's look at some categories of FullBath: [2 1 3 0]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in HalfBath: 3 Let's look at some categories of HalfBath: [1 0 2]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in BedroomAbvGr: 8 Let's look at some categories of BedroomAbvGr: [3 4 1 2 0 5 6 8]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in KitchenAbvGr: 4 Let's look at some categories of KitchenAbvGr: [1 2 3 0]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in TotRmsAbvGrd: 12 Let's look at some categories of TotRmsAbvGrd: [ 8 6 7 9 5 11 4 10 12 3 2 14]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in Fireplaces: 4 Let's look at some categories of Fireplaces: [0 1 2 3]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in GarageCars: 5 Let's look at some categories of GarageCars: [2 3 1 0 4]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in 3SsnPorch: 20 Let's look at some categories of 3SsnPorch: [ 0 320 407 130 180 168 140 508 238 245 196 144 182 162 23 216 96 153 290 304]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in PoolArea: 8 Let's look at some categories of PoolArea: [ 0 512 648 576 555 480 519 738]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in MiscVal: 21 Let's look at some categories of MiscVal: [ 0 700 350 500 400 480 450 15500 1200 800 2000 600 3500 1300 54 620 560 1400 8300 1150]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in MoSold: 12 Let's look at some categories of MoSold: [ 2 5 9 12 10 8 11 4 1 7 3 6]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]
Number of unique categories in YrSold: 5 Let's look at some categories of YrSold: [2008 2007 2006 2009 2010]
Number of Missing Values: 0 Percentage : 0.0 %
[Plots, each rendered twice in the report: i. All the labels As Is; ii. After adding the RARE label]